Workbench Add-On

home *** CD-ROM | disk | FTP | other *** search

/ Workbench Add-On / Workbench Add-On - Volume 1.iso / Dev / Oberon / texts / TechNotes.Text < prev

Wrap

Text File | 1994-05-12 | 53KB | 1,360 lines

Departement Informatik Institut für Computersysteme Eidgenössische Technische Hochschule Zürich ---------------------------------------------------------------------------- Oberon Technical Notes Cuno Pfister (ed.) The purpose of the Oberon technical notes is to provide the implementor of an Oberon system with the experience gained during the implementation efforts undertaken at the Institut für Computersysteme at ETH. Furthermore they give an overview over work already done or under way. This report contains the first five technical notes. Table of Contents 1. Oberon Implementations page 28 2. An Integrated Heap Allocator/Garbage Collector page 30 3. Type Guards and Type Tests page 40 4. A Symmetric Solution to the Load/Store Problem page 42 5. Garbage Collection on Open Arrays page 48 ---------------------------------------------------------------------------- 1. Oberon Implementations Cuno Pfister The original Oberon implementation [1] has been realized by N. Wirth and J. Gutknecht for the Ceres workstation [2]. Several ports of the system to other machines have been completed since then. We will give a short description of each of those projects. Differences to the original implementations are described. Ceres (National Semiconductor NS32x32) The original implementation. Oberon is the basic operating system. Module Display is written in assembly language. The garbage collector is a mark-and-sweep garbage collector which runs only between commands, i.e. it does not need to handle pointers on the stack. An access to a freed module results in a trap on the Ceres-1 and Ceres-2, but goes undetected on the Ceres-3. The heap allocator/garbage collector is written in assembly language and linked together with the inner core modules. Sun SPARCstation (Sun SPARC) This implementation [3] runs on top of SunOS as a Unix process. Oberon takes over the whole screen. The display operations are based on the SUN Pixrect routines. Oberon files are mapped to Unix files. A variant of the buddy system strategy is used for heap allocation. The garbage collector is a mark-and-sweep garbage collector which usually runs between commands, but when NEW fails due to a memory shortage, the garbage collector is started also. This collector also takes pointers on the stack into account. This is done by treating each memory location on the stack as a possible pointer value. To find out whether a memory word is a pointer, it is subjected to some plausibility tests, such as testing whether the value points to a possible block address in the heap and whether a possibly valid type tag is found there. If the plausibility tests succeed, the heap is traversed sequentially to find the corresponding block. If the block is found, the tested memory location is treated as a root for the garbage collector. Modules are never freed but merely removed from the module list, i.e. an access to a freed module cannot be detected and aborted. The loader and heap allocator/garbage collector are written in Modula-2 and linked as a Unix application. Apple Macintosh II (Motorola MC68020) This implementation [4, 5] runs on top of the MacOS as a (MultiFinder friendly) application. Oberon runs in one Macintosh window. The display operations are largely based on the Apple QuickDraw routines. Oberon files are mapped to Macintosh files. An integrated allocator/collector is used (see technical note # 2). Modules are never freed but merely removed from the module list, i.e. an access to a freed module cannot be detected and aborted. The type descriptors contain information about procedure variables in records, such that a checked version of the System.Free command could be implemented in the future. The loader, heap allocator/garbage collector and some raster operations are written in assembly language and linked as a Macintosh application. DEC DECstation (MIPS R2000) This implementation runs on top of Ultrix as a process. Oberon runs in an X-window. The display operations are based on X-windows. Oberon files are mapped to Ultrix files. An integrated allocator/collector is used (see technical note # 2). Modules are never freed but merely removed from the module list, i.e. an access to a freed module cannot be detected and aborted. The loader is written in C and linked as a Unix application. IBM S/6000 (IBM S/6000) This project has recently been started. IBM PS/2 (Intel 80386) This project has recently been started. Others Other Oberon compiler back-ends have been written by students, but the system was not ported. The compiler back-ends available produce code for the following processors: - a virtual stack machine (similar to P-Code or M-Code) - Intel 8086 - Intel 80386 - INMOS T800 Transputer - C language (Oberon subset to C translator, for bootstrapping a compiler) References 1. Wirth N, Gutknecht J (1989), The Oberon System, Software-Practice and Experience, 19 (9), 857-893 2. Eberle H (1987), Development and Analysis of a Workstation Computer, Ph. D. thesis no. 8431, ETH Zürich 3. Templ J (1990), SPARC-Oberon, User's Guide and Implementation, Report 133, ETH Zürich 4. Franz M (1990), The Implementation of MacOberon, Report 141, ETH Zürich 5. Franz M (1990), MacOberon Reference Manual, Report 142, ETH Zürich ---------------------------------------------------------------------------- 2. An Integrated Heap Allocator/Garbage Collector Beat Heeb, Cuno Pfister Abstract Heap Allocation and Garbage Collection are fundamental services of an Oberon implementation. It is shown how a simple and efficient implementation of these services can be attained. Introduction Programs for personal computers become increasingly loaded with features, most of them rarely needed. The reason for that is, apart from the marketing pressure to advertise more features than the competition, the desire to provide all the features that anyone might ever need or want. The result is that programs become ever larger, more complex, buggier, more expensive and sometimes delivered years after their announcements. Even then they often fail to provide features useful for a particular task. A solution to this problem lies in the development of extensible programs. Extensibility here means that a program can provide the customer with only the features needed most of the time and with some means to extend it. If the customer needs some special service, he (or usually a third party) can implement this service himself, without having access to the original program's source code. For example, imagine an extensible page layout program which supports text boxes, draw boxes and bitmap boxes as standard box types, and commands to operate upon them. A user may have special needs concerning the available commands, like e.g. a command which aligns the selected objects in a document in a special way. It should be possible for him to write such a command, which operates on the exported document data structure. This poses a subtle problem, though. The command implementor may generate references, i.e. pointers, to an exported data structure, and these references are not known to the basic layout program. This means in particular that when the layout program disposes of the storage used by a document, there may still be pointers around which reference this storage. Such pointers are called dangling pointers . Dangling pointers are one of the most frequent and most dangerous sources of program malfunctions. Their use often results in the destruction of data not belonging to the erroneous module, and the destruction may not be detected for a long time. Such an error is difficult to track down, consequently it is difficult to determine who is responsible for an accident caused by the error. So extensibility leads to a loss of control over references and thereby to an increased probability for dangling pointers. We will come back to that shortly. The user of a program like the one described above should also be able to write an extension that supports special table boxes, for instance. Such an extension consists of a module which implements the data type Table, together with the particular behaviour of an instance of this type, like displaying, storing and reading itself. A document might then contain text boxes together with table boxes at the same time. This example is typical in that a new data type, which is similar to existing types, is introduced, and that variables of similar data types are integrated in the same data structure (the document). The object-oriented programming style is well suited for the implementation of such a program, since it allows the definition of similar, i.e. compatible types (e.g. Table is a subtype of Box) and since control can be delegated to the subtype by means of the overriding facility (e.g. Table implements its own Draw method). For our discussion the terms record and object can be used interchangeably. Our example has shown the integration of different objects types, potentially implemented by different programmers, in the same data structure. Close integration is obviously useful. On the other hand, it magnifies the problems associated with malfunctioning objects by making the integrity of a data structure dependant on a potentially large number of object implementations. Especially errors which have non-local effects are dangerous. This leads us to the conclusion that while extensibility makes dangling pointers more probable, integration of extensions makes the effects of dangling pointers graver. It is possible to prevent this type of errors going undetected by using a type-safe language. To prevent dangling pointers, the implementation of a type-safe language must guarantee that every pointer variable is initialized correctly and that a heap record is only released when there are no pointers referencing it anymore. The latter task is performed by a garbage collector. When a garbage collector can prove that a heap record is not referenced anymore, it reclaims the corresponding heap area for the storage allocator. Automatic garbage collection is known for a long time, but has not found its way into production languages like Pascal or C, not even into their object-oriented descendants. A garbage collector may reduce the response time and the performance of a program dramatically, or may require memory sizes several times as large as would be the case without a garbage collector. Additional reasons why garbage collectors are not popular are the problems caused by the lack of type-safety in the mentioned languages, e.g. posed by untagged variant records, and the complexity involved in handling arbitrary record types. We want to show how a simple and efficient garbage collector for Oberon [1] can be written. Oberon is a type-safe language derived from Modula-2 which allows for an object-oriented programming style. Oberon is also the name of an operating system and window system [2]. The kernel of an Oberon system provides, among other things, a storage allocator and a garbage collector. Oberon has first been implemented on the Ceres workstation [3]. Our approach is not a radical departure from the Ceres implementation, but rather a refinement which improves upon memory utilization and implementation complexity. Heap Allocation A simple storage allocation algorithm is presented. Small Blocks Storage is allocated in blocks. Block sizes are multiples of a minimal size B . The size s of an allocated block is at least the size of variables of the record type bound to the pointer type of p . This size is known during compilation. For the allocation it is rounded up to the next multiple of B . There is a free list for all supported block sizes, i.e. free blocks of the same size are linked by a simple linear list. The free lists are anchored in a global array A (which could be declared as ARRAY 1..N OF ADDRESS ) with element number i corresponding to the free list for size i * B. Allocation of a block of size s consists in removing a block from free list A[k] where (k = s / B) (i: s / B i < k: A[i] = NIL) (A[k] NIL). If k = s / B then the address of the block is returned. If k > s / B then the block is split into two blocks, the first of size s and the second of size r = k * B - s . The second block is inserted in the free list A[r / B] and the address of the first block is returned. The Ceres implementation is different in that it uses blocks with sizes restricted to powers of two. This leads to an increased internal fragmentation of the memory. In our scheme, each heap variable wastes less than half of the minimal block size. This in contrast to a waste of less than half of the particular block size. The most notable other difference is that our scheme is simpler to implement. Large Blocks It is reasonable to restrict the size of array A such that it supports only small blocks (e.g. less than about 100 bytes). Almost all allocations are done with small block sizes, thus a less efficient allocation strategy can be used for large blocks. A very simple solution is to use A[N] as a free list for blocks of variable size (i.e. size³ (N + j) * B ) and performing a first fit allocation [4] whenever this list must be used. Note that the support for this special case fits naturally into the allocation procedure, it merely adds one line of code. The following pseudo-code listing shows the complete allocation routine: procedure Allocate(var a:address; size:longint); var i:integer; r, l:address; begin i := min(size / B, N); (* calculate index and restrict it to a maximal value *) while (i < N) & (A[i] = NIL) do INC(i) end; (* search smallest non-empty free list *) l := adr (A[i]); a := l^; (* address and value of pointer to first free block *) while (a # NIL) & (a^.size < size) do l := ADR(a^.next); a := l^ end; (* first fit if i = N *) if a # nil then l^ := a^.next; (* remove block from free list *) if a^.size > size then (* block must be split *) i := min ((a^.size - size) / B, N); r := a + size; r^.size := a^.size - size; (* adjust size of residual block *) r^.next := A[i]; A[i] := r (* insert residual block in free list *) end end end Allocate; This algorithm doesn't support fast arbitrary block deallocation, because a freed block cannot efficiently be merged with its lower-address neighbour and because only simple linear lists are used for the free lists, which prohibits fast removal of a block from its free list. In the next chapter we will show how a garbage collector circumvents the need for fast arbitrary block deallocation. Garbage Collection The Ceres implementation of Oberon uses a mark-and-sweep garbage collector [5] for heap storage reclamation. In Oberon, most temporary variables are local and therefore allocated and deallocated on the stack. Thus relatively little garbage is produced compared to typical Lisp or Smalltalk systems. This explains why the Ceres garbage collector proved adequate in practice, contradicting the statement that "Mark-and-sweep automatic storage reclamation does not seem to be practical on contemporary (1988) computers" [6]. A mark-and-sweep collector works in two phases. In the mark phase, all objects which still can be referenced are marked. In the sweep phase, all heap blocks are traversed sequentially. The ones which have not been marked are reclaimed. Mark Phase Let us first consider a simple recursive procedure, which marks all objects reachable from a given pointer: procedure Mark(q:address); var off:address; begin if (q # NIL) & (Unmarked(q)) then SetMark(q); off := FirstPointerOffset(q); while off >= 0 do Mark(mem[q + off]); off := NextPointerOffset(q, off) end end end Mark; This pseudo-code procedure uses four auxiliary procedures. The procedure FirstPointerOffset determines the record field which contains the first pointer. The procedure NextPointerOffset repeatedly yields the next record field containing a pointer. A negative offset is used as a terminating sentinel. To implement FirstPointerOffset and NextPointerOffset it must be possible to efficiently find out the offsets of a record's pointer fields. These offsets are the same for all variables of this record's type. Thus it is reasonable to provide a so-called type descriptor containing a table with all these offset values. Every heap record now needs a pointer to this type descriptor. This pointer is a hidden record field called a type tag and is usually located at offset -PtrSize in the record. The allocation procedure is extended such that it also initializes this tag. (For every record type, the compiler reserves a global variable anchoring the type descriptor. The contents of the appropriate variable is passed to the allocation routine as an additional parameter.) Figure 1: Example of a record variable and its type descriptor Figure 1 shows the descriptor of a type T . It contains the fields size and ptable . ptable is a table of pointer offsets describing where in a variable of type T a pointer can be found. This table, which varies from type to type, is terminated by a negative valued sentinel. The procedure Unmarked tests whether an object has already been marked, the procedure SetMark marks an object under the assumption that it is unmarked (i.e. Unmarked(q) is a precondition of SetMark(q) ). We won't go into details about how marking is realized. It should be sufficient to say that one bit of the type tag can be used for marking, usually either the sign bit or the least significant bit. The following version of the above procedure replaces FirstPointerOffset and NextPointerOffset by an increment of the type tag by the size of a pointer: procedure Mark(q:address); begin if (q # NIL) & (Unmarked(q)) then SetMark(q); Increment(Tag(q)); while mem[Tag(q)] >= 0 do Mark(mem[q + mem[Tag(q)]]); Increment(Tag(q)) end; RestoreTag(q) end end Mark; This means that the type tag of a record changes during the mark phase such that it always points to the offset to be processed and after the offsets already processed (see Figure 2). Figure 2. Type Tag during a Mark Phase The loop over the offsets terminates when a negative offset is found. The value of this offset can be initialized such that RestoreTag simply becomes Tag(q) := Tag(q) + mem[Tag(q)]. We now transform this procedure such that the guard (q # NIL) & (Unmarked(q)) is moved outside of the mark procedure. procedure Mark(q:address); var r:address; begin (* (q # NIL) & JustMarked(q) *) loop Increment(Tag(q)); if mem[Tag(q)] >= 0 then r := mem[q + mem[Tag(q)]]; if (r # NIL) & (Unmarked(r) then SetMark(r); Mark(r) end else RestoreTag(q); return end end end Mark; JustMarked means that the object is marked, but none of its descendants. To eliminate recursion, we introduce an explicit stack. The recursive call is replaced by Push(q); q := r and the RETURN is replaced by Pop(q): procedure Mark(q:address); var r:address; begin Stack := Empty; loop Increment(Tag(q)); if mem[Tag(q)] >= 0 then r := mem[q + mem[Tag(q)]]; if (r # NIL) & Unmarked(r) then SetMark(r); Push(q); q := r end else RestoreTag(q); if Stack = Empty then exit else Pop(q) end end end end Mark; The drawback of this procedure is the use of an additional stack, i.e. of additional memory. In the algorithm of Deutsch/Schorr/Waite [7], the stack is distributed to the individual pointer locations which are being traversed. We use a variable p as stack pointer, i.e. as pointer to the object containing the predecessor. The predecessor is contained in one of the pointer fields, namely the one currently being processed. The old value of this pointer field is either held in the auxiliary variable r or on the stack also. This leads to the following replacements: Stack = Empty -> p = NIL Push(q) -> mem[q + mem[Tag(q)]] := p; p := q Pop(q) -> a := p + mem[Tag[p]]; r := mem[a]; mem[a] := q; q := p; p := r procedure Mark(q:address); var p, r, a:address; begin p := NIL; loop Increment(Tag(q)); if mem[Tag(q)] >= 0 then r := mem[q + mem[Tag(q)]]; if (r # NIL) & Unmarked(r) then SetMark(r); mem[q + mem[Tag(q)]] := p; p := q; q := r end else RestoreTag(q); if p = NIL then exit else a := p + mem[Tag[p]]; r := mem[a]; mem[a] := q; q := p; p := r end end end end Mark; In the appendix there is a listing of an actual implementation of the mark procedure written in MC68000 assembly language. This implementation also takes into consideration that additional data (which is not important for our discussion) must be stored in the type descriptor, between the size field and ptable . Concerning the type descriptors we should add that they may be treated just as any other records. Thus they need their own type descriptors. These meta type descriptors differ only in their size fields. They in turn can share a common meta meta type descriptor, whose tag points back to itself, i.e. it is its own type descriptor. It may be more practical though to mark type descriptors in some way and to treat them as special cases. Sweep Phase In the sweep phase the heap is traversed sequentially, block by block. To do that, the size of all blocks must be known. The sum of a block's address and its size yields the address of the next block. Since there is a type tag at the beginning of each allocated block, such a block's size can be found by inspecting the type descriptor to which the tag points. A free block can be treated the same way, with the difference that it is its own "type descriptor". How this can be done is shown in Figure 3. Figure 3: Structure of a free block At the beginning of the sweep phase, the free lists are all cleared. The sweep constructs completely new free lists by treating consecutive unmarked blocks as single large blocks and inserting them in the appropriate free lists. All marks are cleared. procedure Scan; var p: address; begin A[1..N] := NIL; q := HeapStart; repeat while (q # MemSize) & Marked(q) do ResetMark(q); q := q + mem[mem[q]] end; if q # MemSize then p := q; repeat q := q + mem[mem[q]] until (q = MemSize) or Marked(q); Insert(p, q - p) end until q = MemSize end Scan; The procedure Insert(p, s) inserts a free block at address p in the free list for size s . The invariant over this "merge-sweep" is that all free blocks that have been traversed already are of maximal size, i.e. merged. Only the block visited most recently might have to be merged with the next one. This invariant is a principal difference between the merge-sweep deallocation and the deallocation of arbitrary blocks. Acknowledgements We would like to thank H. Mössenböck, N. Wirth, R. Griesemer and W. Weck. References 1. Wirth N (1988) The Programming Language Oberon. Software-Practice and Experience, 18 (7), 661-670 2. Wirth N, Gutknecht J (1989) The Oberon System. Software-Practice and Experience, 19 (9), 857-893 3. Eberle H (1987) Development and Analysis of a Workstation Computer, Ph. D. thesis no. 8431, ETH Zürich 4. Knuth D (1973) The Art of Computer Programming, Addison-Wesley 5. McCarthy J (1960) Recursive Functions of Symbolic Expressions and Their Computation by Machine, I, Comm. ACM, 3, 184-195 6. Ungar D, Jackson F (1988), Tenuring Policies for Generation-Based Storage Reclamation, OOPSLA `88 Proceedings, 107-118 7. Schorr H, and Waite W (1967), An efficient machine-independent procedure for garbage collection in various list structures, Comm. ACM, 10 (8), 501-505 Appendix The following listing shows an implementation of the mark phase for one root pointer in MC68000 assembly language. A complete implementation would additionally have to iterate over all global (and possibly all local) pointer variables as roots. Note that the mark bit in the type tag is set during the whole traversal of a record, thus it can be statically compensated for by using an offset of -1 when addressing relative to the type tag. * 68000 mark phase for garbage collector * A0: pointer to father * A1: pointer to node * A2: temporary, for pointer rotation * A3: tag or pointer to current pointer offset * pointer offsets are usually accessed via A3 with an offset of * Offset(ptable) - 4 - 1. * The pointer is incremented before it is accessed, thus the subtraction * of PtrSize. * The subtraction of 1 comes from the set mark bit (bit # 0). * D0: offset * D1: temporary PtrSize EQU 4 ;size of pointers and offsets Tag EQU -4 ;offset of type tag TagL EQU Tag+3 ;low byte of type tag Mark EQU 0 ;mark bit (in TagL) PTab EQU 36 ;ptable offset Offset EQU PTab-PtrSize-1 Start: MOVE.L A1,D1 ; NIL test BEQ End ; NIL BSET.B #Mark,TagL(A1) ; test and set mark bit BNE End ; marked MOVE.L #0,A0 ; father := NIL MOVE.L Tag(A1),A3 ; load first tag BRA Loop Up: ADD.L D0,A3 ; adjust tag MOVE.L A3,Tag(A1) ; save tag MOVE.L A0,D1 ; NIL test BEQ End ; father = NIL (sentinel) MOVE.L Tag(A0),A3 ; load father.tag MOVE.L Offset(A3),D0 ; load offset MOVE.L (A0,D0),A2 ; rotate pointers, step 1 MOVE.L A1,(A0,D0) ; rotate pointers, step 2 MOVE.L A0,A1 ; rotate pointers, step 3 MOVE.L A2,A0 ; rotate pointers, step 4 Loop: ADDQ.L #PtrSize,A3 ; address of next offset MOVE.L Offset(A3),D0 ; load next offset BMI Up ; negative sentinel reached, i.e. end of ; list MOVE.L (A1,D0),A2 ; load son (and rotate pointers, step 1) MOVE.L A2,D1 ; NIL test BEQ Loop ; NIL BSET.B #Mark,TagL(A2) ; test and set mark bit BNE Loop ; marked Down: MOVE.L A3,Tag(A1) ; save tag MOVE.L A0,(A1,D0) ; rotate pointers, step 2 MOVE.L A1,A0 ; rotate pointers, step 3 MOVE.L A2,A1 ; rotate pointers, step 4 MOVE.L Tag(A1),A3 ; load new tag BRA Loop End: ---------------------------------------------------------------------------- 3. Type Guards and Type Tests Cuno Pfister In Oberon, indirectly referenced record variables (i.e. referenced by pointer or passed as VAR parameter) may have a different type at run-time than the one which is declared statically. An Oberon implementation must guarantee that the actual type of a record is an extension of the record's declared type. This is done with the aid of type guards [1]: A type guard tests whether the type of an indirectly referenced record variable is an extension of some statically declared type. If not, the program is aborted. A type test is similar to a type guard, but instead of aborting a program it returns the value FALSE, otherwise TRUE. We present a scheme which allows a very efficient implementation of type guards and type tests. A record type is represented at run-time by a type descriptor (see Figure). In this type descriptor there is a table of pointers (ttable), at a fixed offset and with a fixed size. The pointer in entry 0 points to the type descriptor of the original base type of the extension hierarchy (level 0 type). Entry 1 points to the first extension (level 1 type), entry 2 to the extension of the first extension, and so on. If the type descriptor denotes a level n type, the first n + 1 entries are used. The entries for higher level types are set to NIL. The level 0 entry may even be omitted, since a type guard on a base type is never executed at run-time. Nevertheless it is recommended to include it, for an example of where it can be useful see technical note # 4. Obviously the depth of the extension hierarchy is limited by such an arrangement. We recommend a table size of about 8 entries. This is thought to be large enough for all practical purposes. ttable type descriptor record p tag size ptable The generated code can be described as follows: the type guard v(T) becomes p := Tag(v); IF p^.ttable[L] # Tadr THEN HALT(18) END and the type test v IS T becomes p := Tag(v); RETURN p^.ttable[L] = Tadr where L is the extension's level, and Tadr the address of T 's type descriptor. T is a hidden global variable in the module which declares type T . Tag(v) is different for a record referenced via pointer and for a VAR parameter. The former contains the tag in the record itself, while the latter's tag is passed as an implicit parameter, together with the address of the record. A guard applied to a NIL valued pointer aborts the program. The WITH statement produces the same code as a type guard, the difference is only relevant for the compiler. A similar scheme has been presented in [1]. Hidden type guards are generated for the assignment of one record to another, indirectly referenced record variable. In this case the implementation must enforce strict equality of the record types on both sides of the assignment. This leads to a simpler type guard, namely: p := Tag(v); IF p # Tadr THEN HALT(19) END References 1. Cohen N H (1989), Type-Extension Type Tests Can Be Performed in Constant Time, IBM Research Report 2. Wirth N (1988), Type Extensions, ACM Trans. on Programming Languages and Systems, Vol. 10, No. 2, 204-214 ---------------------------------------------------------------------------- 4. A Symmetric Solution to the Load/Store Problem J. Templ 1.3.91 The problem of loading and storing polymorphic data structures from or to files using load and store messages is usually considered to be asymmetric because an object existing in memory can receive a store message but an object existing on a file cannot receive a load message. Nevertheless a symmetric solution for the problem is proposed. It is argued that a symmetric solution is both, more beautiful and more flexible. The key to a symmetric solution is in the separation of the information associated with an object into a header and a contents part. The header contains the type information and the contents part contains the data associated with the object. As the type information is not maintained by the object itself but by the class the object belongs to, it is straight forward to use class methods (ordinary procedures) to handle the type information and to use instance methods (type bound procedures or message handlers) to handle the contents of an object. In the proposed solution all classes handle the type information the same way, therefore one can think of the type handling methods as meta class methods. When dealing with extended types, the problem of loading and storing inherited state (possibly invisible to the extended type) arises. Using inherited load and store methods (super-calls) solves the problem but there is a subtle point to observe. When the type information is handled by the object itself instead of by the (meta) class, each overriding method is forced to use super calls. Also super calls must be done before any other data is stored onto the file. The more flexible symmetric solution follows postulate 1: "An object never stores its own type information as response to a store message" Loading an object o using a Rider R may be done by a procedure ReadObj(R, o) that generates an object according to the header information. Then the object's data can be loaded by sending a load message, e.g. o.Load(R). $ object = header contents. Storing an object o using a Rider R may be done by a procedure WriteObj(R, o). Storing the contents of the object may be done by sending a store message o.Store(R). Let T be a subtype of Object and T1 a subtype of T. Let Load and Store be procedures bound to T and Load' and Store' procedures bound to T1 overriding and invoking the inherited procedures. Storing of an object v with dynamic type T1 to a file is then done by the following steps: 1. WriteObj(R, v); 2. v.Store(R) invokes Store' 2.1. v.Store^(R) in Store' invokes Store 2.1.1. data associated with type T is stored 2.2. additional data associated with type T1 is stored Loading of an object of type T1 from a file into a variable v is done in symmetric steps: 1. ReadObj(R, v); 2. v.Load(R) invokes Load' 2.1. v.Load^(R) in Load' invokes Load 2.1.1. data associated with type T is loaded 2.2. additional data associated with type T1 is loaded ReadObj and WriteObj are responsible for internalizing and externalizing the object's type information. For internalized objects this information consists of a type tag, i.e. a pointer to a type descriptor node unique for a type. For externalized objects the type tags are mapped to reference numbers that refer to the type of the object. The first occurence of a type reference is followed by the externalized type descriptor consisting of the name of the module defining the type and the type's name. $ header = ref [module type]. $ ref = integer. $ module = char {char} 0X. $ type = char {char} 0X. For easy maintenance of the reference numbers of types, a ref field in the type descriptor is assumed. To avoid resetting this field before each "store session", a global virtual clock (store counter) is introduced and instead of the reference number in the type descriptors a time stamp is actually used. This time stamp contains the clock value of the type's last externalization. The reference number written to the file is the difference between the time stamp and the start time (clock0) of the stores which is defined by calling a Reset procedure. For "load sessions" a type table and a type counter has to be maintained which are also initialized by the same Reset procedure. For more details see the prototype implementation below. The use of the Reset procedure is restricted by postulate 2: "The initialization of the generic load/store mechanism must be symmetric" More accuratly, each Reset call preceding a store sequence must correspond to a Reset call preceding a load sequence and vice versa. Normally, the resets are done on the level of user activated commands. A module Files1 is assumed to support persistent data portable across different Oberon implementations. Files1 contains procedures for loading and storing basic types (integers, reals, ...) in a portable way and it contains procedures to load and store the empty object, i.e. it handles the dynamic type information associated with an object derived from Files1.Object. Properties of the proposed solution: - no install mechanism required - efficient externalization and internalization - no additional storage in internalized objects - compact external representation - easy to implement - low space overhead in type descriptors (time stamp plus type name) - flexible in the use of super-calls - pure Oberon (language) A prototype of module Files1 has been implemented under SPARC-Oberon: MODULE Files1; (* J.Templ, 24.2.91 *) IMPORT SYSTEM, Files, Kernel, Modules; TYPE Object* = POINTER TO ObjectDesc; ObjectDesc* = RECORD END ; TDesc = POINTER TO RECORD m: Kernel.Module; name: ARRAY 24 OF CHAR; time: LONGINT (* < clock *) END ; VAR module*, type*: ARRAY 24 OF CHAR; (* most recent internalized type *) clock, noftypes: LONGINT; (* clock0 = clock - noftypes *) typTab: ARRAY 256 OF LONGINT; ... PROCEDURE New(typetag: LONGINT): Object; ... END New; PROCEDURE ThisType(m: Kernel.Module; VAR type: ARRAY OF CHAR): LONGINT; ... END ThisType; PROCEDURE Reset*; BEGIN noftypes := 0 END Reset; PROCEDURE ReadObj* (VAR R: Files.Rider; VAR o: Object); VAR ref, tag: LONGINT; m: Kernel.Module; BEGIN Read(R, ref); IF ref = noftypes THEN ReadString(R, module); ReadString(R, type); m := Modules.ThisMod(module); IF m # NIL THEN tag := ThisType(m, type); IF tag # 0 THEN typTab[ref] := tag; INC(noftypes); o := New(typTab[ref]) ELSE R.res := 1 END ELSE R.res := 2 END ELSIF ref # -1 THEN o := New(typTab[ref]) ELSE o := NIL END END ReadObj; PROCEDURE WriteObj* (VAR R: Files.Rider; o: Object); VAR tag: TDesc; t: LONGINT; BEGIN IF o # NIL THEN SYSTEM.GET(SYSTEM.VAL(LONGINT, o)-4, t); tag := SYSTEM.VAL(TDesc, t - 36); IF tag.time < clock - noftypes THEN Write(R, noftypes); Files1.Write(R, noftypes); tag.time := clock; INC(noftypes); INC(clock); Files1.WriteString(R, tag.m.name); Files1.WriteString(R, tag.name) ELSE Write(R, tag.ref) ELSE Files1.Write(R, tag.time - (clock - noftypes)) END ELSE Write(R, -1) END END WriteObj; BEGIN clock := 1; noftypes := 0 END Files1. Example: loading and storing a binary tree using type bound procedures. TYPE Tree = POINTER TO TreeDesc; TreeDesc = RECORD (Files1.ObjectDesc) left, right: Tree END PROCEDURE (t: Tree) Load (VAR R: Files.Rider); BEGIN Files1.ReadObj(R, t.left); IF t.left # NIL THEN t.left.Load(R) END ; Files1.ReadObj(R, t.right); IF t.right # NIL THEN t.right.Load(R) END ; END Load; PROCEDURE (t: Tree) Store (VAR R: Files.Rider); BEGIN Files1.WriteObj(R, t.left); IF t.left # NIL THEN t.left.Store(R) END ; Files1.WriteObj(R, t.right); IF t.right # NIL THEN t.right.Store(R) END ; END Store; PROCEDURE StoreCmd*; ... Files1.Reset; Files1.WriteObj(R, t); IF t # NIL THEN t.Store(R) END END StoreCmd; PROCEDURE LoadCmd*; ... Files1.Reset; Files1.ReadObj(R, t); IF t # NIL THEN t.Load(R) END END LoadCmd; Note that an asymmetric solution which stores the type information of an object as response to the store message is not significantly shorter because NIL pointers have to be handled explicitly. It also has the disadvantage that the external representation of NIL has to be known (unless a special procedure that stores the value NIL has been introduced). PROCEDURE StoreCmd*; (* the asymetric solution *) ... Files1.Reset; IF t # NIL THEN t.Store(R) ELSE Files1.Write(R, 0) END END StoreCmd; The following presents the complete interface of module Files1 together with a short description of the external data representation. DEFINITION Files1; (*J.Templ 31.1.91*) (* module to support portable persistent data. ReadInt, WriteInt: 2 Byte integers, little endian byte ordering ReadLInt, WriteLInt: 4 Byte integers, little endian byte ordering ReadSet, WriteSet: 4 byte sets, little endian byte ordering, ORD({0}) = 1 ReadReal, WriteReal: 4 byte IEEE reals, little endian byte ordering ReadLReal, WriteLReal: 8 byte IEEE reals, little endian byte ordering ReadString, WriteString: arbitrary length, null terminated Read, Write: compact integers, 1 to 5 byte, cf. ETH Report 133, 1990 *) IMPORT Files; TYPE Object = POINTER TO ObjectDesc; ObjectDesc = RECORD END ; VAR module, type: ARRAY 24 OF CHAR; PROCEDURE Read (VAR R: Files.Rider; VAR i: LONGINT); PROCEDURE ReadInt (VAR R: Files.Rider; VAR i: INTEGER); PROCEDURE ReadLInt (VAR R: Files.Rider; VAR i: LONGINT); PROCEDURE ReadLReal (VAR R: Files.Rider; VAR r: LONGREAL); PROCEDURE ReadReal (VAR R: Files.Rider; VAR r: REAL); PROCEDURE ReadSet (VAR R: Files.Rider; VAR s: SET); PROCEDURE ReadString (VAR R: Files.Rider; VAR s: ARRAY OF CHAR); PROCEDURE ReadObj (VAR R: Files.Rider; VAR o: Object); PROCEDURE Write (VAR R: Files.Rider; i: LONGINT); PROCEDURE WriteInt (VAR R: Files.Rider; i: INTEGER); PROCEDURE WriteLInt (VAR R: Files.Rider; i: LONGINT); PROCEDURE WriteLReal (VAR R: Files.Rider; r: LONGREAL); PROCEDURE WriteReal (VAR R: Files.Rider; r: REAL); PROCEDURE WriteSet (VAR R: Files.Rider; s: SET); PROCEDURE WriteString (VAR R: Files.Rider; VAR s: ARRAY OF CHAR); PROCEDURE WriteObj (VAR R: Files.Rider; o: Object); PROCEDURE Reset; END Files1. ---------------------------------------------------------------------------- 5. Garbage Collection on Open Arrays J. Templ 3.3.91 Traditional garbage collectors for Oberon ignore the problem of traversing array structures on the heap by assuming that the compiler forbids such constructs (implementation restriction). However, there are good reasons for supporting pointers to arrays (fixed size and open arrays) with arbitrary element types and at the same time eliminating the implementation restriction. This paper proposes a solution for traversing array data structures that affects the inner loop of the garbage collector's mark phase in the common case of traditional record nodes only by two simple assignments when following a pointer (two register moves on most processors). The outer loop needs two additional bit tests (within registers). Although the mark phase can be formulated without nested loops, one can think of the highly time critical operations performed on each pointer of a node (skipping nil pointers, skipping pointers that point to marked blocks, and following a pointer) as the "inner loop". The "outer loop" contains all operations performed on each block, i.e. the inner loop and some operations when leaving the node. In other words, the inner loop is executed O(N) times, where N is the number of reachable pointers, and the outer loop is executed O(M) times, where M is the number of reachable blocks. It is obvious that N >= M and in some cases N >> M. Let T, Elem and P be types as defined below and v be a variable of type P. TYPE T = ARRAY n0, n1 .. ni-1 OF Elem; Elem = RECORD ... END ; P = POINTER TO T; The proposed algorithm assumes a storage block pointed to by v to look like this: v-4 tag points to Elem type descriptor v --> data points to the first array element size n0 * n1 * .. * ni-1 * SIZE(Elem) arrpos reserved for the garbage collector The type descriptor of an array block is the type descriptor of the element type which must be a record. Multi-dimensional arrays are "flattened", i.e. they are treated like big one-dimensional arrays. Arrays within records are not affected, i.e. they are still expanded in the type descriptor of the record. In addition to the existing block kinds SysBlk (a block allocated with SYSTEM.NEW) and RecordBlk (a block allocated with NEW), a new kind ArrayBlk is defined, i.e. now there are three different kinds of heap blocks. The mark phase of a garbage collector supporting only SysBlk and RecordBlk looks like the following procedure Mark that traverses all nodes reachable from the node pointed to by q. Mark expects the parameter q to point to a marked record block. PROCEDURE Mark(q: Pointer); VAR n: Pointer; BEGIN q.cnt := 0; LOOP IF Traversed(q) THEN Reset(q); IF StackEmpty THEN EXIT END ; Pop(q) ELSE Pointer(q, n); IF (n # NIL) & Unmarked(n) THEN SetMark(n); IF RecordBlk(n) THEN Push(q); q := n; q.cnt := -1 END END END ; INC(q.cnt) END END Mark; The meaning of the macros is as follows (some of them will be used later): RecordBlk(q), ArrayBlk(q), SysBlk(q), Unmarked(q) check a particular bit combination usually encoded in the type tag of q SetMark(q) set a bit combination in the type tag of q to signal that q is reachable Traversed(q) true iff all nodes reachable from q are marked. offset := Offset(q, q.cnt); RETURN offset < 0 Reset(q) resets some information temporarily encoded in the type tag of q StackEmpty true if stack of partially traversed nodes is empty Pointer(q, n) set n to the next son of q to be traversed. offset := q.tag.PtrTab[q.cnt]; n := mem[q+offset] Push(q) Push q onto the stack of partially traversed nodes. offset := Offset(q, q.cnt); mem[q+offset] := tos; tos := q Pop(q) Pop q from the stack of partially traversed nodes. offset := Offset(tos, tos.cnt); n := mem[tos+offset]; mem[tos+offset] := q; q := tos; tos := n Offset(q, n) the offset of the n-th pointer in block q Elemsize(q) the size of one array element. Elemsize(q) should be expandable to q.tag.size, i.e. the size of the element type should be available in the type descriptor. Most Oberon implementations currently round the record size available in the type descriptor to the next power of two or to the next number divisible by 16 or the like. In this case, the element size must be included in the array block needing some additional space. q.cnt the number of sons of q that are already traversed To include array blocks, we apply the mark algorithm iteratively to all array elements in the same way as it is done for records. For array blocks q.arrpos is used to hold the offset of the current array element, i.e. q.arrpos DIV Elemsize(q) holds the number of fully traversed array elements. q.cnt gives the number of sons of the element pointed to by q.arrpos that are already traversed. Mark now expects the parameter q to point to a marked record or array block. PROCEDURE Mark(q: Pointer); VAR n: Pointer; BEGIN IF ArrayBlk(q) THEN q.arrpos := 0 END; q.cnt := 0; LOOP IF Traversed(q) THEN Reset(q); IF ArrayBlk(q) & (q.arrpos + Elemsize(q) # q.size) THEN INC(q.arrpos, Elemsize(q)); q.cnt := -1 ELSIF StackEmpty THEN EXIT ELSE Pop(q) END ELSE Pointer(q, n); IF (n# NIL) & Unmarked(n) THEN SetMark(n); IF RecordBlk(n) OR ~SysBlk(n) THEN Push(q); q := n; q.cnt := -1 END END END ; INC(q.cnt) END END Mark; Unfortunately every macro that accesses a pointer in q needs to distinguish between Record and Array blocks now. For the Pointer macro the situation is like this: Pointer(q, n) = offset := Offset(q, q.cnt); IF ArrayBlk(q) THEN n := mem[q.data + q.arrpos + offset] ELSE n := mem[q + offset] END To avoid this distinction, we introduce an auxiliary variable t and an auxiliary invariant: H(t, q): <=> (RecordBlk(q) & t = q) OR (ArrayBlk(q) & t = q.data + q.arrpos) Pointer(q, t, n) = offset := Offset(q, q.cnt); n := mem[t + offset] Push(q, t) = offset := Offset(q, q.cnt); mem[t + offset] := tos; tos := q Pop(q, t) = offset := Offset(tos, tos.cnt); IF ArrayBlk(tos) THEN t := tos.data + tos.arrpos ELSE t := tos END; n := mem[t + offset]; mem[t + offset] := q; q := tos; tos := n t must be set appropriately when entering a node, but it remains unchanged while looping over nil pointers, marked pointers, or sysblks within a node. The overhead when pushing a record node is only a single assignment (shown in italics), the overhead for pushing an array node is one additional test and two assignments. The overhead for returning from a record node is two bit-tests and one assignment (shown in italics), for finishing array elements, another test for detecting the end of the array is needed. PROCEDURE Mark(q: Pointer); VAR n, t: Pointer; BEGIN IF ArrayBlk(q) THEN q.arrpos := 0; t := q.data ELSE t := q END ; q.cnt := 0; LOOP {H} IF Traversed(q) THEN Reset(q); IF ArrayBlk(q) & (q.arrpos + Elemsize(q) # q.size) THEN INC(q.arrpos, Elemsize(q)); q.cnt := -1; INC(t, Elemsize(q)) ELSIF StackEmpty THEN EXIT ELSE Pop(q, t) END ELSE Pointer(q, t, n); IF (n # NIL) & Unmarked(n) THEN SetMark(n); IF RecordBlk(n) THEN Push(q, t); q := n; t := q; q.cnt := -1 ELSIF ~SysBlk(n) THEN Push(q, t); q := n; q.arrpos := 0; t := q.data; q.cnt := -1 END END END ; INC(q.cnt) END END Mark; The modest additional complexity of the solution and the small runtime overhead for record blocks (shown in italics) seem to be justified by the additional functionality. In practice it turns out that a sophisticated encoding of the predicates ArrayBlk(q), SysBlk(q), RecordBlk(q), and Unmarked(q) has to be used to get a fast collector. The encoding used in a prototype implementation in SPARC-Oberon is explained below using an Oberon-like notation with relaxed typing rules, e.g. q.tag is sometimes used as a set, sometimes as a pointer, and sometimes as an integer. To avoid confusion, set operators are indexed with s. It is assumed that sets, pointers and integers have 4 bytes, and that bit i corresponds to value 2^i. Predicate Encoding Invariant Unmarked(q) ~(0 IN q.tag) RecordBlk(q) ~(1 IN q.tag) RecordBlk(q) => ~SysBlk(q) ArrayBlk(q) 1 IN q.tag SysBlk(q) 2 IN q.tag SysBlk(q) => ArrayBlk(q) free(q) 3 IN q.tag These encodings follow the rule that an allocated unmarked record block should not have any auxiliary bits set in the type tag, i.e. it should have a valid type tag without masking some bits first. This is important for fast type guards and type tests during program execution. For counting the number of sons already visited, the technique proposed by B. Heeb is used. Therefore the type tag is used as a pointer into the offset table, too. The pointer offsets are 4 byte integers, i.e. only the low order two bits of the tag can be used for Unmarked and RecordBlk. Fortunately, SysBlk is never used when traversing a node, as system blocks are supposed to have no pointers at all. free(q) is used by the scan phase of the collector. Using 4 bits in the type tag forces type descriptors to a 16 byte alignment. Note that the type tag can only be used as a pointer after masking the low order bits. In the following mem[p] means the 4 byte word at memory location p. PROCEDURE Mark(q: Pointer); VAR n, t, tos: Pointer; offset: LONGINT; BEGIN IF ArrayBlk(q) THEN q.arrpos := 0; t := q.data ELSE t := q END ; INC(q.tag, PtrTabOffset); tos := NIL; LOOP {H} offset := mem[q.tag - {0, 1}]; IF offset < 0 THEN INC(q.tag, offset); IF ArrayBlk(q) & (q.arrpos + Elemsize(q) # q.size) THEN INC(q.arrpos, Elemsize(q)); INC(q.tag, PtrTabOffset - 4); INC(t, Elemsize(q)) ELSIF tos = NIL THEN EXIT ELSE Pop(q, t) END ELSE n := mem[t + offset]; IF (n # NIL) & Unmarked(n) THEN INCL(n.tag, 0); IF RecordBlk(n) THEN Push(q, t); q := n; t := q; INC(q.tag, PtrTabOffset - 4) ELSIF ~SysBlk(n) THEN Push(q, t); q := n; q.arrpos := 0; t := q.data; INC(q.tag, PtrTabOffset - 4) END END END ; INC(q.tag, 4) END END Mark; The macros Push and Pop are now defined as: Push(q, t) = mem[t + offset] := tos; tos := q Pop(q, t) = offset := mem[tos.tag - {0, 1}]; IF ArrayBlk(tos) THEN t := tos.data + tos.arrpos ELSE t := tos END; r := mem[t + offset]; mem[t + offset] := q; q := tos; tos := r This procedure may be implemented using assembly language or "pseudo Oberon" (Oberon + module SYSTEM). However, there are still some improvements possible. The repeated access to mem[q.tag - {0, 1}] in the inner loop can be accelerated by introducing an auxiliary variable tag initialized with q.tag - {0, 1}. In this case, q.tag has to be updated whenever a node is left. This update operation and the bit tests before Pop may be optimized by introducing another variable qtag with qtag = q.tag * {0,1}. There are also some common subexpressions that could be eliminated (by a compiler). PROCEDURE Mark(q: Pointer); VAR n, t, tos: Pointer; offset, tag: LONGINT; qmask, ntag: SET; BEGIN IF 1 IN q.tag THEN q.arrpos := 0; t := q.data; qmask := {0, 1} ELSE t := q; qmask := {0} END; tag := q.tag - {0, 1} + PtrTabOffset; tos := NIL; LOOP {H} offset := mem[tag]; IF offset < 0 THEN q.tag := tag + offset + qmask; IF 1 IN qmask & (q.arrpos + Elemsize(q) # q.size) THEN INC(q.arrpos, Elemsize(q)); INC(tag, offset + PtrTabOffset - 4); INC(t, Elemsize(q)) ELSIF tos = NIL THEN EXIT ELSE qmask := tos.tag; tag := qmask - {0, 1}; qmask := qmask * {0, 1}; IF 1 IN qmask THEN t := tos.data + tos.arrpos ELSE t := tos END; offset := mem[tag]; n := mem[t + offset]; mem[t + offset] := q; q := tos; tos := n END ELSE n := mem[t + offset]; IF (n # NIL) THEN ntag := n.tag; IF ~(0 IN ntag) THEN q.tag := tag + qmask; n.tag := ntag + {0}; IF ~(1 IN ntag) THEN mem[t + offset] := tos; tos := q; q := n; t := q; tag := ntag + PtrTabOffset - 4; qmask := {0} ELSIF ~(2 IN ntag) THEN mem[t + offset] := tos; tos := q; q := n; q.arrpos := 0; t := q.data; tag := ntag - {1} + PtrTabOffset - 4; qmask := {0, 1} END END END END; INC(tag, 4) END END Mark; The overhead for record nodes is shown in italics. On modern processors it consists of about two machine cycles when following a pointer to a record node and six cycles when leaving a record node. Switching from one array element to the next needs one additional compare, two storage reads, one storage write, and three additions (15 - 20 cycles).